Abstract

Many recent analyses for conventional imperative programs begin by transforming programs 
into logic programs, capitalising on existing LP analyses and simple LP semantics. We propose 
using logic programs as an intermediate program representation throughout the compilation 
process. With restrictions ensuring determinism and single-modedness, a logic program can 
easily be transformed to machine language or other low-level language, while maintaining the 
simple semantics that makes it suitable as a language for program analysis and transformation. 
We present a simple LP language that enforces determinism and single-modedness, and show 
that it makes a convenient program representation for analysis and transformation. 

KEYWORDS: compilers, control flow graphs, intermediate representation, program analysis and 
transformation, SSA 


1 Introduction 

Most compilers, regardless of the programming language(s) and paradigms supported, use some Intermediate Representation (IR) between parsing the input program and emitting the object code. Use of an IR has the significant advantage of allowing a compiler to target multiple CPU architectures, and even multiple programming languages, without duplicating the bulk of the compiler, which operates exclusively on the IR. Over the course of the compilation, this representation will be analysed for different characteristics and transformed in various semantics-preserving ways, in preparation for efficient object code generation. Thus it is important for an IR to make program analysis and transformation as simple and convenient as possible. Three-address code has been a popular form for this purpose for many years. Figure 1 presents a three-address code language. Here we assume we are given Name, the set of all possible function names; Var, the set of variable names; and Const, the set of all primitive constant values. We let Primval = Var ∪ Const.

To simplify exposition, we let ⊕ stand for all primitive arithmetic and logical operators, and ⊙ stand for all primitive binary comparison operators.

* This work was supported by the Australian Research Council through Discovery Project Grant DP140102194.

Fig. 1. A three-address code language

Block → BlockID : Phi* Prim* BlockExit
Phi → Var = φ(Var*)

Fig. 2. Changes to three-address language to produce SSA

Each basic block of a function is a sequence of function calls and primitive instructions, 
ending with a control transfer to another basic block. Once control enters a basic block, 
it is guaranteed to reach its end (unless some exceptional circumstance arises). This 
guarantee makes analysis of each basic block straightforward. 

A popular variant of three-address code is Static Single Assignment (SSA) form (Alpern et al. 1988; Cytron et al. 1991; Lattner and Adve 2004). SSA was proposed as a way to generalize value numbering, a technique used to remove redundant computation. In SSA form, each variable is assigned at most once in its scope. Where a variable would be reassigned, a new variable is introduced instead. Since each variable is assigned only once, it is not necessary to consider the program point when referring to a variable, only the function it appears in. This makes many analyses simpler and more efficient, because a single abstract value can be associated with each variable name in a function, and the set of variable names of interest is limited and easily determined.
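As a minimal sketch of the renaming idea, assume a toy representation (not from the paper) in which a straight-line block is a list of (destination, operator, arguments) triples. Each assignment receives a fresh versioned name, and each use is redirected to the latest version; name clashes with pre-existing names such as `x1` are ignored here.

```python
def to_ssa(block):
    """Rename a straight-line block so every variable is assigned exactly once.

    block: list of (dest, op, args) triples; args may be variable names or constants.
    Names never assigned in the block (e.g. function inputs) are left unchanged.
    """
    versions = {}  # variable -> number of definitions seen so far
    current = {}   # variable -> SSA name of its most recent definition
    renamed = []
    for dest, op, args in block:
        # Uses refer to the latest definition, so rename them before defining dest.
        new_args = [current.get(a, a) for a in args]
        versions[dest] = versions.get(dest, 0) + 1
        name = f"{dest}{versions[dest]}"
        current[dest] = name
        renamed.append((name, op, new_args))
    return renamed


# x is reassigned, so its second definition becomes x2 and later uses follow it.
ssa = to_ssa([("x", "add", ["a", "b"]),
              ("x", "mul", ["x", 2]),
              ("y", "sub", ["x", "a"])])
```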

A basic block with multiple predecessors presents a complication for SSA: a variable use in such a block may refer to definitions of that variable in any of the predecessor blocks. To give such a variable a single definition, SSA introduces the concept of a φ node: the variable is assigned the result of a “fake” function that takes as input the version of the variable from each predecessor block. A block with multiple predecessors will contain as many φ nodes as it has variables with alternative definitions in earlier blocks. Figure 2 presents the changes to three-address syntax needed to transform to SSA: each block may begin with φ nodes. Consider, for example, the C code to compute the greatest common divisor shown in Figure 3 (left). This code can be converted into SSA form as shown in Figure 3 (right).
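To make the φ rule concrete, here is a small sketch assuming each predecessor block is summarised by a map from source variable to the SSA name current at its end (a hypothetical representation). A φ node is needed exactly for the variables whose reaching names differ between predecessors. This is the simple local criterion described above, not the dominance-frontier placement algorithm of Cytron et al.

```python
def phi_nodes(reaching):
    """Return {var: [(pred, ssa_name), ...]} for variables needing a phi node.

    reaching: dict mapping each predecessor block to a dict var -> SSA name
    that is current at the end of that predecessor (at least one predecessor).
    """
    preds = sorted(reaching)
    # Only variables defined along every incoming path can be merged.
    common = set.intersection(*(set(reaching[p]) for p in preds))
    phis = {}
    for var in sorted(common):
        sources = [(p, reaching[p][var]) for p in preds]
        if len({name for _, name in sources}) > 1:  # alternative definitions
            phis[var] = sources
    return phis


# A loop header like that of gcd, entered from the entry block and the loop body
# (the SSA names are hypothetical).
header_phis = phi_nodes({"entry": {"a": "a0", "b": "b0"},
                         "body": {"a": "a2", "b": "b2"}})
```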

Several researchers have presented program analyses that work by first transforming an imperative source program (e.g., Spoto et al. (2010) and Albert et al. (2012)), or Java bytecode (e.g., Benton and Fischer (2007)), into an abstract form based on the constructs of logic programming, and then analysing this result. Others (e.g., Whaley et al. (2005)) have used logic programs to represent program analyses. In some cases this benefits from existing logic program analyses, but the greater benefit derives from the simple, traditional T_P semantics for logic programs, and hence simpler and more powerful analyses. Logic programs have none of the limitations of SSA form that we detail below.

int gcd(int a, int b) {
    while (b != 0) {
        int t = b;
        b = a % t;
        a = t;
    }
    return a;
}

Fig. 3. The gcd function in C (left) and LLVM-style SSA form (right)

In this paper we propose representing an imperative source program as a logic program throughout the compilation process. It may be surprising to think of compiling C programs by translation to Prolog, rather than the reverse, but we show that placing a few limitations on the generated logic programs leaves low-level programs suitable for high-level analysis and transformation, and also for final translation to machine language.

In Section 2 we discuss problematic aspects of SSA and related forms, together with suggested ways of addressing them. In Section 3 we introduce “Logic Programming (LP) Form” and show how to translate three-address code to it. In Sections 4 and 5 we give example analyses for LP form. In Section 6 we discuss related work. Finally, Section 7 reviews what has been achieved with the proposed LP form, and concludes.


2 SSA and Allied Forms: Problems and Solutions 


While SSA form does simplify a number of common program analyses, it has significant limitations that interfere with others. Most of these problems can be solved, at the cost of further complicating the SSA form. In this section we consider these limitations.


2.1 Path obliviousness 

Basic blocks do not indicate the constraints that must be satisfied for them to be entered. These constraints appear in predecessor blocks. In a forward analysis, this means constraints must be propagated from conditional branches to their target blocks. A backward analysis is clumsier: it must peek backward into each predecessor block to see what conditions hold.

Consider, for example, forward interval analysis of the code shown in Figure 4. The blocks left and right both refer to x, so there is no straightforward way to separate the reasoning that needs to be done under different assumptions about x. The next section considers the use of different names for x in the separate branches, which would help in this example. However, as it stands, we cannot assign non-trivial intervals to y2 and z2 in the absence of path constraints that record how control reached tail.

Fig. 4. SSA and branching

To solve the path obliviousness problem, Ballance et al. (1990) proposed the use of Gated Single-Assignment form (GSA). SSA’s φ nodes are replaced by different types of gating functions. These capture the control conditions that determine which of the various definitions that reach the node should provide its value. One gating function, γ, is in essence an if-then-else function. For example, we might translate φ(x1, x2) to γ(P, x1, x2), where P is some branch condition from elsewhere in the program. The flow of definitions inside loops is managed by additional gating functions that handle initial and loop-carried values (μ nodes) as well as loop-exiting values (η nodes).
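A minimal sketch of how a γ gating function might be read in an interval domain. It assumes (our simplification) that the analysis knows P as True, False, or None (unknown), and represents intervals as (lo, hi) pairs. When the condition is known, only the selected definition flows through; otherwise both are joined, exactly as a plain φ node would.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]

def join(a: Interval, b: Interval) -> Interval:
    """Smallest interval containing both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def gamma(p: Optional[bool], x1: Interval, x2: Interval) -> Interval:
    """Abstract gamma(P, x1, x2): x1 if P holds, x2 if it does not."""
    if p is True:
        return x1
    if p is False:
        return x2
    return join(x1, x2)  # condition unknown: no better than phi(x1, x2)
```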

This form makes information flow more transparent, but it is extremely complex compared to SSA. The form we propose has greater uniformity, as it does not introduce a variety of different mechanisms for the “joining” or “merging” of information. Moreover, GSA, like SSA, does not readily lend itself to backward analysis, as discussed next.


2.2 Forward bias 


The φ nodes of SSA are convenient when analysing each basic block, as they clearly indicate which variables of which other blocks provide values to the variables of the block. However, this assumes forward analysis. In this direction, where execution paths join, each variable with alternative sources is indicated by a φ node specifying the different names for each alternative, and because each variable is defined only once per function, it naturally receives only one abstract value during forward analysis.

For a backward analysis, however, there is no node dual to φ to indicate the alternative destinations that may use each variable following a branch instruction. (While the branch indicates alternative destinations, it does not specify the variables that may be used there.) Importantly, the alternative destinations for a branch all use the same name for each variable. In a backward analysis, then, different blocks of a function may determine different abstract values for the same variable name: the virtue of SSA that each variable has a unique definition in each function does not apply to backward analysis.

Consider Figure 4 and suppose we wish to verify whether the division is safe, i.e., that t cannot be 0. In a fixed-width integer context (as we assume here), it is convenient to use “wrapped intervals” (Gange et al. 2015) as an abstract domain, as these allow us to capture both intervals and complemented intervals. Reasoning backwards, we find the following sufficient safety condition for the tail block: z0, z1 ∉ [1, 1]. For x, this then translates to x ∉ [1, 1] (for left) and x ∉ [−1, −1] (for right). This allows us to conclude that all will be well if x ∉ [−1, 1], but that is insufficient to prove safety.
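Wrapped intervals can be sketched compactly. Assuming w-bit integers (w = 8 below, chosen only for the example), an interval [lo, hi] denotes the values met when counting upward from lo to hi modulo 2^w, so an interval whose bounds wrap around denotes a complemented interval. This simplified illustration, not the full domain of Gange et al. (2015), checks the conditions used in the text.

```python
W = 8              # bit width assumed for this example
MOD = 1 << W

def contains(lo: int, hi: int, x: int) -> bool:
    """Membership in the wrapped interval [lo, hi] over W-bit integers.

    Bounds and x are taken modulo 2**W, so -1 denotes the same value as 255.
    """
    return (x - lo) % MOD <= (hi - lo) % MOD

def complement(lo: int, hi: int):
    """Wrapped interval holding exactly the values outside [lo, hi]
    (valid whenever [lo, hi] is not the full range)."""
    return ((hi + 1) % MOD, (lo - 1) % MOD)

not_one = complement(1, 1)      # x not in [1, 1], needed for left
not_small = complement(-1, 1)   # x not in [-1, 1], the combined condition
```

The combined condition implies both branch conditions, which is why it is sufficient, but it also excludes x = 0, which the program does not need excluded.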


To address this problem, Ananian (1999) proposed adding σ nodes to SSA form to create Static Single Information (SSI) form. Where SSA form has a φ node at the top of each block indicating where the value of each variable in the block comes from, SSI adds a σ node at the bottom of each branching block indicating where each variable’s value goes to. This permits reasoning in both directions and provides (variable) names for all relevant pieces of information. However, it does not address a number of other problems, as we now explain.
2.3 Lack of φ node compositionality

For a non-relational value analysis, which assigns each variable a single abstract value, a φ node conveniently specifies that the abstract value for a variable is the join of the abstract values of the input variables. For a relational analysis, however, a φ node does not have such a simple interpretation. Consider an octagon analysis (Miné 2006) of the program snippet in Figure 4. For the two transitions to the tail block, we have x − z0 = 0 ∧ y0 + z0 = 0 and x − y1 = 0 ∧ y1 + z1 = 0. Or, assuming SSI, we have x0 < 0 ∧ x0 − z0 = 0 ∧ y0 + z0 = 0 and x1 ≥ 0 ∧ x1 − y1 = 0 ∧ y1 + z1 = 0. In either case, there is no meaningful (abstract) interpretation of the statement y2 = φ(y0, y1) in isolation that does not throw away most of these relationships. In particular, we lose the fact that y2 + z2 = 0. The two φ nodes must be treated together to prevent this loss of precision. What is really needed is a single node that conveys the information of (y2, z2) = φ((y0, z0), (y1, z1)).
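The loss of precision can be reproduced with a deliberately simplified relational domain. We assume conjunctions of linear equalities with right-hand side 0, joined by keeping only the constraints present on both paths, which is weaker than a real octagon join. Treating y2 = φ(y0, y1) and z2 = φ(z0, z1) one at a time loses y2 + z2 = 0. Treating them as one joint node, with the renaming and projection described in Section 2.4, keeps it.

```python
def eq(**coefs):
    """Linear equality sum(coef * var) = 0 as a hashable, sorted tuple."""
    return tuple(sorted(coefs.items()))

def rename(cs, mapping):
    """Rename variables in every constraint according to mapping."""
    return {tuple(sorted((mapping.get(v, v), c) for v, c in con)) for con in cs}

def project(cs, keep):
    """Drop every constraint that mentions a variable outside keep."""
    return {con for con in cs if all(v in keep for v, _ in con)}

def phi(states, mappings, targets):
    """Join incoming states after renaming the phi sources to targets."""
    results = [project(rename(s, m), targets) for s, m in zip(states, mappings)]
    joined = results[0]
    for r in results[1:]:
        joined = joined & r  # keep only what holds on every incoming path
    return joined

left = {eq(x=1, z0=-1), eq(y0=1, z0=1)}    # x - z0 = 0 and y0 + z0 = 0
right = {eq(x=1, y1=-1), eq(y1=1, z1=1)}   # x - y1 = 0 and y1 + z1 = 0

# One phi node at a time: each sees a single variable, so no relation survives.
separate = (phi([left, right], [{"y0": "y2"}, {"y1": "y2"}], {"y2"})
            | phi([left, right], [{"z0": "z2"}, {"z1": "z2"}], {"z2"}))

# A joint node (y2, z2) = phi((y0, z0), (y1, z1)) keeps y2 + z2 = 0.
joint = phi([left, right],
            [{"y0": "y2", "z0": "z2"}, {"y1": "y2", "z1": "z2"}],
            {"y2", "z2"})
```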

SSI does not help with this problem and in fact the remedies discussed so far appear to 
address particular symptoms rather than a more fundamental cause which, in our view, 
is an insufficiently abstract view of name management. 


2.4 Name management


The φ and σ nodes of SSA and SSI form require special treatment during analysis. A φ node v = φ(v1, v2, ...) cannot be treated like a function call, because the variables mentioned come from alternative blocks; that is, they cannot exist at the same time. For example, when analysing a basic block beginning with v6 = φ(v3, v5), we must find the analysis results for the two predecessor blocks, rename v3 to v6 in the first and v5 to v6 in the second, and then find the join of the two and project away other variables in the originating blocks. In essence, all φ (and σ) nodes in a block must be treated similarly to the way a function call is treated: information about actual parameters must be renamed to match formal parameters (or vice versa for backward analysis), information about variables not conveyed in the call must be projected away, and the join of all incoming calls must be taken.


Appel (1992) and Kelsey (1995) observed similarities between SSA and continuation-passing style in functional programming. Later, Appel (1998) observed that SSA is in a sense equivalent to functional programming without continuations, and he presented a transformation from SSA to functional program (FP) form. This form mitigates the name management problem, using parameter passing to serve the purpose of φ nodes: where SSA form would have a block with a φ node for each variable defined in predecessor blocks, FP form has a function with a parameter for each variable defined outside. Likewise, FP form uses function calls in place of jumps between blocks. Since SSA form supports function call and return in addition to φ nodes and jumps between blocks, FP form is notably simpler than SSA. So while analyses for SSA form are often only intra-procedural, analyses for FP form will naturally be inter-procedural as well.
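To illustrate the correspondence, here is a sketch of the gcd function of Figure 3 written in FP style, with one Python function per basic block (the block names are our own). Parameters take the place of φ nodes and calls take the place of jumps. Python does not eliminate tail calls, so this is only suitable for modest inputs.

```python
def gcd(a, b):
    return header(a, b)            # jump to the loop header


def header(a, b):
    # Parameters a and b play the role of the header's phi nodes:
    # each caller (the entry or the loop body) supplies its own version.
    if b != 0:
        return body(a, b)
    return exit_block(a)


def body(a, b):
    t = b
    b_next = a % t                 # agrees with C's % for the non-negative inputs used here
    a_next = t
    return header(a_next, b_next)  # the back edge becomes a (tail) call


def exit_block(a):
    return a
```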


Appel’s note “SSA is functional programming” (Appel 1998) conveys these points very clearly. But a corollary is that functional form also preserves forward bias. We share the enthusiasm for a declarative formalism, but we also point out that a relational view can offer greater flexibility than a functional view.
Fig. 5. An LP form language

2.5 Input/output asymmetry

While each input parameter of a function has a unique name apparent in the function header, the return value cannot be determined without scanning all the function blocks. In fact, there may be many alternative variables returned by different blocks. This is inconvenient for any summarising analysis, which ultimately must project the analysis result for the function onto the function inputs and output. For example, in Figure 3 one knows that a1 and b1 are inputs to the function (this is omitted from the figure to save space), but one must examine all the blocks of the function to see that a1 is the output. In fact, a function with more than one return may have many different output variables.

These problems can be avoided by first transforming the function, replacing all return statements with jumps to a distinguished new final block containing a φ node that joins all return values into a new variable, which is then returned. If the function header is augmented to record this final variable name along with the function parameters, it is no longer necessary to scan all blocks to find the unique return variable. This extra step is not difficult, but it is unnecessary for LP form.
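A sketch of this preliminary transformation follows. It assumes a hypothetical CFG encoding in which each block is a list of instruction tuples ending in a terminator such as ("ret", var) or ("goto", label). The names "exit" and "result" are assumed to be fresh.

```python
def unify_returns(blocks, exit_label="exit", result="result"):
    """Redirect every return to one exit block that merges the returned
    values with a single phi node; return the new CFG and the result name."""
    sources = []
    new_blocks = {}
    for label, instrs in blocks.items():
        *body, terminator = instrs
        if terminator[0] == "ret":
            sources.append((label, terminator[1]))  # value returned on this path
            terminator = ("goto", exit_label)
        new_blocks[label] = body + [terminator]
    new_blocks[exit_label] = [("phi", result, sources), ("ret", result)]
    return new_blocks, result


cfg = {"entry": [("br", "x < 0", "neg", "pos")],
       "neg": [("sub", "y", 0, "x"), ("ret", "y")],
       "pos": [("ret", "x")]}
new_cfg, ret_var = unify_returns(cfg)
```

A summarising analysis can then read the single output name from the function header instead of scanning every block.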

A related inconvenience is the fact that functions can only return a single result. If, for 
example, two functions compute different values through similar computations, and the 
two are often called together, it may be desirable to fuse the two functions into a single 
one that returns two values. Of course, this may be done by returning a tuple, but in 
this case a structure is returned instead of two separate values, which may thwart many 
analyses. This can be solved by allowing functions to return multiple separate values. 


2.6 Implicit variable scoping 

While the φ nodes of a block indicate some of the variables defined on entry to the block, they do not indicate all of them. In fact, a block with only one predecessor will generally not have any φ nodes at all, and so no indication of which variables are defined on entry. Nor does SSA form provide any indication of which variables of one block are communicated to its successors. For analyses whose efficiency depends on minimising the number of variables under consideration, knowing which variables enter and leave a block would allow irrelevant variables to be projected away.


3 LP Form 

SSA is a small refinement of three-address code. We argue that a larger refinement, to a restricted form of logic programming, provides the single-assignment benefits of SSA for ease of analysis while avoiding the problems outlined in Section 2.
Figure 5 presents a restricted logic programming language suitable for representing low-level programs.¹ In addition to fitting this grammar, LP form requires that for each guard (i.e., ⊙) in each clause, there must be at least one other clause for the same procedure that is identical up to that guard, followed by the complementary guard, and any two clauses for a procedure must contain complementary guards, up to which they are identical.² Furthermore, all clause heads for a given procedure must be identical. This tames the nondeterminism of logic programming, ensuring that exactly one clause will succeed for each set of inputs, and makes analysis of procedures with multiple clauses simpler. That is, only one clause of each procedure will be executed, and no backtracking will be necessary.
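The guard condition can be checked syntactically. The sketch below is a simplification: it models each clause only by its sequence of guards, ignoring the non-guard goals that must also be identical up to each guard. It represents a guard as a hypothetical (op, lhs, rhs) triple and assumes the complement table shown.

```python
COMPLEMENT = {"<": ">=", ">=": "<", ">": "<=", "<=": ">", "==": "!=", "!=": "=="}

def complement(guard):
    op, lhs, rhs = guard
    return (COMPLEMENT[op], lhs, rhs)

def guards_well_formed(clauses):
    """clauses: one list of (op, lhs, rhs) guards per clause, in order."""
    # Every guard needs another clause that agrees up to it and then
    # has the complementary guard.
    for i, c in enumerate(clauses):
        for k, g in enumerate(c):
            if not any(len(d) > k and d[:k] == c[:k] and d[k] == complement(g)
                       for j, d in enumerate(clauses) if j != i):
                return False
    # Any two clauses must agree up to a pair of complementary guards.
    for i in range(len(clauses)):
        for j in range(i + 1, len(clauses)):
            c, d = clauses[i], clauses[j]
            k = 0
            while k < min(len(c), len(d)) and c[k] == d[k]:
                k += 1
            if k == min(len(c), len(d)) or d[k] != complement(c[k]):
                return False
    return True
```

For the two gcd clauses of Figure 8, the guards b ≠ 0 and b = 0 pass; guards such as x < 0 and x > 0 fail, since they leave x = 0 uncovered.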

This form also tames the multiple modes of a logic program by explicitly dividing the arguments into inputs followed by outputs, separated by a semicolon. In calls to primitive as well as user-defined procedures, an input argument must be either a variable or a constant value, and an output argument must be a variable. All parameters in procedure heads must be variables. This ensures that variables are free until they are assigned, after which they are ground. As in Mercury (Somogyi et al. 1996), no dereferencing is ever needed.

LP form differs from SSA form in the following ways: 

• Instead of blocks, LP form has clauses: a procedure comprises one or more clauses, exactly one of which will be executed.

• Instead of conditional constructs and computed jumps, LP form has guards, instructions that can either succeed or fail, determining which clause will be executed.

• It replaces unconditional branches with procedure calls, and loops with recursion.

• All registers (variables) in a clause are either parameters to that procedure or are defined in that clause; thus it has no need for φ nodes.

• It uses parameters to pass data out of, as well as into, procedures; thus it has no return instruction.

• It explicitly models changes to data structures and input/output operations, allowing pure functions to be recognised and optimised. SSA could do this, but, at least in the LLVM implementation, does not.

• Where SSA form has four different control transfer operations, plus φ nodes, LP form has only procedure calls and multiple clauses, so LP form is simpler.

One disadvantage of this representation is that the common initial parts of the clauses are 
duplicated for each clause, leading to duplicated analysis effort. Our current preliminary 
implementation factors out the duplicated code, representing a procedure body as a tree, 
with a sequence of goals at each node, and optionally a guard and two child nodes. 
This not only avoids duplicated analysis work, but also ensures that the clauses remain 
mutually exclusive and exhaustive through any program transformations. 
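A minimal sketch of such a tree-shaped procedure body follows. The class and field names are hypothetical and goals are kept as plain strings. Flattening the tree recovers the clause view; because every guard has exactly two children, the resulting clauses are mutually exclusive and exhaustive by construction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Body:
    goals: List[str]                  # straight-line goals at this node
    guard: Optional[str] = None       # optional test choosing a child
    then: Optional["Body"] = None     # taken when the guard succeeds
    other: Optional["Body"] = None    # taken when it fails


def to_clauses(body: Body, prefix=()) -> List[List[str]]:
    """Expand the shared-prefix tree into one goal list per clause."""
    prefix = prefix + tuple(body.goals)
    if body.guard is None:
        return [list(prefix)]
    return (to_clauses(body.then, prefix + (body.guard,))
            + to_clauses(body.other, prefix + (f"not({body.guard})",)))


# The simplified gcd procedure of Fig. 8 as a tree.
gcd_body = Body([], "b != 0",
                then=Body(["mod(a, b; b')", "gcd(b, b'; ret)"]),
                other=Body(["ret = a"]))
```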


¹ Details such as handling of type information and symbol tables are outside the scope of this paper. Our handling of them is similar to that of other IRs.

² Note that, in LP form constructed directly from three-address code, there will be at most one guard in a clause; however, inlining can produce clauses with multiple guards.
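$$\mathbf{v} = \mathit{vars}(B_0,\ldots,B_n) \qquad (B_0,\mathbf{v},\emptyset) \Rightarrow (B_0',C_0,\theta_0)\ \cdots\ (B_n,\mathbf{v},\emptyset) \Rightarrow (B_n',C_n,\theta_n)$$

$$H = f(\bar{p}, st; ret, st') \qquad H_0 = f_{B_0}(\bar{\mathbf{v}}, st; ret, st')\ \cdots\ H_n = f_{B_n}(\bar{\mathbf{v}}, st; ret, st')$$

$$\frac{\mathit{newvar}\ v' \qquad \theta' = \theta[v \mapsto v'] \qquad a' = a\theta}{(v = \oplus(a), \mathbf{v}, \theta) \Rightarrow (\oplus(a'; v'), \mathit{true}, \theta')}
\qquad
\frac{\mathit{newvar}\ v', st' \qquad \theta' = \theta[v \mapsto v', st \mapsto st'] \qquad a' = a\theta}{(v = g(a), \mathbf{v}, \theta) \Rightarrow (g(a', st; v', st'), \mathit{true}, \theta')}$$

$$(\mathbf{return}\ v, \mathbf{v}, \theta) \Rightarrow (ret = v\theta, \mathit{true}, \theta)
\qquad
\frac{\mathit{newvar}\ st' \qquad \theta' = \theta[st \mapsto st']}{(\mathbf{goto}\ B, \mathbf{v}, \theta) \Rightarrow (f_B(\bar{\mathbf{v}}, st; ret, st'), \mathit{true}, \theta')}$$

$$\frac{\mathit{newproc}\ \nu \qquad C_t = \nu(\bar{\mathbf{v}}, st; ret, st') \leftarrow v_i \odot v_j \wedge f_{B_t}(\bar{\mathbf{v}}, st; ret, st') \qquad C_f = \nu(\bar{\mathbf{v}}, st; ret, st') \leftarrow \neg(v_i \odot v_j) \wedge f_{B_f}(\bar{\mathbf{v}}, st; ret, st')}{(\mathbf{if}\ (v_i \odot v_j)\ B_t\ B_f, \mathbf{v}, \theta) \Rightarrow (\nu(\bar{\mathbf{v}}, st; ret, st'), C_t \wedge C_f, \theta)}$$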


Fig. 6. Translation from three-address code to LP form 


3.1 Translation to LP form 

To simplify exposition, we assume the source program is presented in three-address code form.³ We denote by v̄ a sequence of the zero or more variables comprising the set v.

To track side effects, our translation uses the distinguished variable st to denote the state of the computation, including the heap and input/output state. This ensures that operations that may have side effects will be executed in the correct order, while allowing pure operations to be reordered. We also use ret to hold the value returned by the function.

Figure 6 presents our translation. Here the notation Φ ⇒ Γ indicates that the function definition Φ is transformed to the clauses Γ. In the remaining transforms, the notation (Φ, v, θ) ⇒ (Ψ, C, θ') means that, in the context of substitution θ and variables v, statements Φ are translated to goals Ψ, with extra clauses C and resulting substitution θ'. The substitutions are used to ensure each variable has a single assignment, and the extra clauses are for auxiliary predicates generated to implement conditionals. We let newvar x and newproc x specify that x is a fresh variable or procedure name, respectively.

As indicated by the first transform, each basic block is transformed into a single-clause procedure, with one extra clause to invoke the first. For simplicity, each of these clauses takes all the variables appearing in the function, plus the state variable st, as inputs, and the return value variable ret and the state, as modified by the block body, as outputs. The final transform produces a two-clause procedure for each conditional primitive. Because these transforms are idempotent and non-overlapping, confluence is assured.
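The following sketch mimics the statement-level rules of Figure 6 under simplifying assumptions of our own. Statements are tuples, goals are rendered as strings, and fresh names get a numeric suffix. θ is applied to every input, including the state variable, which is our reading of the rules' intent. The if rule, which generates the two auxiliary clauses, is omitted.

```python
import itertools


def translate_block(stmts, variables, theta):
    """Translate one basic block's statements into LP-form goals.

    stmts: ("prim", v, op, args) | ("call", v, g, args) | ("goto", label) | ("return", v)
    variables: the function's variables, passed to every block procedure
    theta: dict mapping each source variable (and "st") to its current name
    Returns the goal list and the final substitution.
    """
    theta = dict(theta)
    fresh = itertools.count(1)

    def newvar(v):
        return f"{v}_{next(fresh)}"

    def subst(args):
        return [str(theta.get(a, a)) for a in args]

    goals = []
    for stmt in stmts:
        kind = stmt[0]
        if kind == "prim":        # v = op(a)  =>  op(a theta; v')
            _, v, op, args = stmt
            a, v2 = subst(args), newvar(v)
            goals.append(f"{op}({', '.join(a)}; {v2})")
            theta[v] = v2
        elif kind == "call":      # v = g(a)  =>  g(a theta, st theta; v', st')
            _, v, g, args = stmt
            a = subst(args) + [theta["st"]]
            v2, st2 = newvar(v), newvar("st")
            goals.append(f"{g}({', '.join(a)}; {v2}, {st2})")
            theta[v], theta["st"] = v2, st2
        elif kind == "goto":      # goto B  =>  f_B(v theta, st theta; ret, st')
            args = subst(variables) + [theta["st"]]
            goals.append(f"f_{stmt[1]}({', '.join(args)}; ret, st')")
        elif kind == "return":    # return v  =>  ret = v theta
            goals.append(f"ret = {theta.get(stmt[1], stmt[1])}")
    return goals, theta


# The loop body of gcd (Fig. 3): t = b; b = a % t; a = t; goto header
body_goals, _ = translate_block(
    [("prim", "t", "mov", ["b"]), ("prim", "b", "mod", ["a", "t"]),
     ("prim", "a", "mov", ["t"]), ("goto", "header")],
    ["a", "b", "t"], {"a": "a", "b": "b", "t": "t", "st": "st"})
```

The resulting goals have the same shape as the gcd_body clause of Figure 7, up to the naming of fresh variables.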

Figure 7 shows the gcd function of Figure 3 translated to LP form. The transformation is rather simple-minded, threading every variable to each clause. However, the neededness analysis described in Section 5 allows the removal of unnecessary variable threading, and a simple inlining heuristic can remove unnecessary procedures. Figure 8 shows the translated gcd program after these transformations.

³ Because variables in LP form are scoped to a single clause, rather than to all the blocks of a function body, translation from SSA is actually less convenient than from three-address code.

gcd(a, b, st; ret, st') ← gcd_header(a, b, t, st; ret, st')
gcd_header(a, b, t, st; ret, st') ← gcd_ν(a, b, t, st; ret, st')
gcd_ν(a, b, t, st; ret, st') ← b ≠ 0 ∧ gcd_body(a, b, t, st; ret, st')
gcd_ν(a, b, t, st; ret, st') ← b = 0 ∧ gcd_tail(a, b, t, st; ret, st')
gcd_body(a, b, t, st; ret, st') ← t' = b ∧ mod(a, t'; b') ∧ a' = t' ∧ gcd_header(a', b', t', st; ret, st')
gcd_tail(a, b, t, st; ret, st) ← ret = a

Fig. 7. The gcd program translated to LP form

gcd(a, b; ret) ← b ≠ 0 ∧ mod(a, b; b') ∧ gcd(b, b'; ret)
gcd(a, b; ret) ← b = 0 ∧ ret = a

Fig. 8. The translated gcd program of Figure 3 after simplification
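Because exactly one clause's guard can succeed, the clauses of Figure 8 can be read directly as an executable definition. The sketch below is an illustration, not a compilation scheme: it evaluates both guards, checks that exactly one holds, and runs that clause. c_mod is our helper giving C's truncating remainder, since Python's % rounds toward negative infinity.

```python
def c_mod(a: int, b: int) -> int:
    """C99 remainder: the result takes the sign of the dividend a."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def gcd(a: int, b: int) -> int:
    # One (guard, body) pair per clause of Fig. 8.
    clauses = [
        (lambda: b != 0, lambda: gcd(b, c_mod(a, b))),  # mod(a, b; b') and gcd(b, b'; ret)
        (lambda: b == 0, lambda: a),                    # ret = a
    ]
    chosen = [body for guard, body in clauses if guard()]
    # LP form guarantees exclusive, exhaustive guards, so no backtracking is needed.
    assert len(chosen) == 1
    return chosen[0]()
```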


3.2 Translation from LP form to machine language 

The Mercury project (Somogyi et al. 1996) has demonstrated that logic programs can be translated to very efficient executable code by tracking predicate determinism at compile time and eliminating variable dereferencing. LP form likewise eschews unification of “logic variables” and the need for dereferencing, but goes further, eliminating nondeterminism and the need for choicepoints and a machine register to track them. Since LP form is designed to be suitable for any language, it does not provide its own memory management solution, and so does not need a register to control memory allocation.

In fact, LP form is surprisingly close to the machine language of common computers. Its 
ability to express operations with multiple outputs better reflects CPU capabilities than 
the functional restriction imposed by common three-address languages. For example, the 
x86 architecture’s IDIV instruction produces both a quotient and a remainder in separate 
registers, and numerous instructions modify flags in addition to other registers; these are 
better abstracted in LP form than in three-address code. 
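For instance, an LP-form goal such as idiv(a, b; q, r) (a hypothetical primitive name) can name both results of a single division. The sketch below models those two outputs with truncation toward zero, as x86 IDIV does. Overflow and division by zero are not modelled.

```python
def idiv(a: int, b: int):
    """Two outputs of one signed division: quotient truncated toward zero,
    remainder with the sign of the dividend."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    r = a - b * q
    return q, r
```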

As mentioned above, our implementation actually factors out the common initial part 
of all the clauses for a procedure. That is, each procedure is represented as a body, which 
is a list of goals optionally ending with a test to select between two (or more) subsequent 
bodies. This representation closely matches the structure of the code to be generated: 
some straight-line code ending with a conditional branch to one alternative and a fall 
through to the other. 

The end of each clause is also easily translated through last call optimisation: if the 
final operation in a clause is a procedure call, that call is changed to an unconditional 
branch to the destination. If it is a primitive, it is followed by a return instruction. Other 
than this, machine code generation for LP form is similar to SSA or three-address code. 
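A toy sketch of this code generation scheme, with hypothetical instruction spellings, follows. Each node's goals become straight-line code. A guard becomes a conditional branch to one child with a fall-through to the other. A final call becomes a jump (last call optimisation), and a final primitive is followed by a return.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Goal = Tuple[str, str]  # ("call", text) or ("prim", text)


@dataclass
class Body:
    goals: List[Goal] = field(default_factory=list)
    guard: Optional[str] = None
    then: Optional["Body"] = None
    other: Optional["Body"] = None


def emit(body: Body, out: List[str], labels) -> List[str]:
    """Append pseudo-assembly for body to out."""
    goals = list(body.goals)
    last = goals.pop() if body.guard is None and goals else None
    for kind, text in goals:
        out.append(f"call {text}" if kind == "call" else text)
    if body.guard is not None:
        label = f"L{next(labels)}"
        out.append(f"if {body.guard} goto {label}")  # branch to the 'then' child
        emit(body.other, out, labels)                # fall through to the other child
        out.append(f"{label}:")
        emit(body.then, out, labels)
    elif last is not None and last[0] == "call":
        out.append(f"jmp {last[1]}")                 # last call optimisation
    else:
        if last is not None:
            out.append(last[1])
        out.append("return")
    return out


# The gcd procedure of Fig. 8, with argument moves made explicit as registers.
gcd_tree = Body([], "b != 0",
                then=Body([("prim", "b' = mod(a, b)"), ("prim", "a = b"),
                           ("prim", "b = b'"), ("call", "gcd")]),
                other=Body([("prim", "ret = a")]))
code = emit(gcd_tree, [], itertools.count(1))
```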
p(x; u) ← x < 0 ∧ negate(x; y) ∧ z = x ∧ p1(y, z; u)
p(x; u) ← x ≥ 0 ∧ y = x ∧ negate(x; z) ∧ p1(y, z; u)
p1(y, z; u) ← sub(z, 1; t) ∧ mod(y, t; u)


Fig. 9. Example of Fig. 4 in LP form


4 LP form analysis and transformation 

In this section we show that LP form does not share the flaws discussed in Section 2, and discuss its other benefits. Consider again the example program of Figure 4. After simplification through inlining of simple procedures and elimination of unnecessary dataflow, this would be expressed in LP form as shown in Figure 9.⁴ When performing a forward interval analysis on this code, the x < 0 condition in the first clause gives the interval [−∞, −1] for x, [1, ∞] for y, and [−∞, −1] for z prior to the call to p1. For the second clause, we infer [0, ∞] for x, [0, ∞] for y, and [−∞, 0] for z. Computing the join of the abstract states for the two calls to p1, we have y ∈ [0, ∞] ∧ z ∈ [−∞, 0], so analysing p1 gives us y ∈ [0, ∞] ∧ z ∈ [−∞, 0] ∧ t ∈ [−∞, −1] on reaching the call to mod, allowing us to certify the safety of the mod operation. The path-awareness of LP form gives us stronger analysis results without any extra effort.
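The forward interval reasoning above can be reproduced in a few lines. This sketch assumes unbounded integers, so it ignores the wrap-around that a wrapped-interval domain handles. It hard-codes the clauses of Figure 9, and also shows the path-oblivious result obtained when the guards x < 0 and x ≥ 0 are ignored.

```python
import math

INF = math.inf


def meet(a, b):
    return (max(a[0], b[0]), min(a[1], b[1]))


def join(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]))


def negate(a):
    return (-a[1], -a[0])


def analyse_p(x=(-INF, INF), use_guards=True):
    """Intervals for y, z and t = z - 1 on reaching mod(y, t; u) in p1."""
    # Clause 1: x < 0, negate(x; y), z = x
    x1 = meet(x, (-INF, -1)) if use_guards else x
    y1, z1 = negate(x1), x1
    # Clause 2: x >= 0, y = x, negate(x; z)
    x2 = meet(x, (0, INF)) if use_guards else x
    y2, z2 = x2, negate(x2)
    # p1 is reached from both calls: take the least upper bound.
    y, z = join(y1, y2), join(z1, z2)
    t = (z[0] - 1, z[1] - 1)  # sub(z, 1; t)
    return y, z, t


def mod_is_safe(t):
    """The divisor interval must exclude 0."""
    return not (t[0] <= 0 <= t[1])
```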

Since each LP form clause is logically an unordered conjunction, it is equally adept at forward and backward analysis. Consider a backward analysis of the program of Figure 9 to determine the safety of modulo (division) operations. This will start with the constraint t ≠ 0 at the end of p1, which implies z ≠ 1 on entry to p1. Analysing the first clause of p backwards from its call to p1, we deduce z ≠ 1 ∨ x ≠ 1 ∨ y ≠ −1 before reaching the x < 0 goal. Handling this goal gives us x < 0 → (z ≠ 1 ∨ x ≠ 1 ∨ y ≠ −1) ≡ true, meaning we have nothing else to prove for that clause. Turning to the second clause of p, we derive x ≥ 0 → (z ≠ 1 ∨ x ≠ −1 ∨ y ≠ −1) ≡ true, and again the proof obligation is discharged.
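As a sanity check of the two discharged obligations, not a reimplementation of the wrapped-interval analysis, the following sketch pushes the obligation t ≠ 0 backwards through each clause body by substitution. It then checks, exhaustively over 8-bit two's-complement values (a width assumed only to keep the search small), that each clause's guard implies the resulting obligation.

```python
BITS = 8  # small width assumed so that every value can be enumerated


def wrap(v: int) -> int:
    """Two's-complement wrap-around to a signed BITS-bit value."""
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v >= 1 << (BITS - 1) else v


def obligation_p1(y: int, z: int) -> bool:
    """Entry obligation of p1: sub(z, 1; t) must give t != 0 (y is unconstrained)."""
    return wrap(z - 1) != 0


# The obligation pushed backwards through each clause body of p.
def obligation_clause1(x: int) -> bool:  # negate(x; y), z = x
    return obligation_p1(wrap(-x), x)


def obligation_clause2(x: int) -> bool:  # y = x, negate(x; z)
    return obligation_p1(x, wrap(-x))


def discharged(guard, obligation) -> bool:
    """Does the guard imply the obligation for every BITS-bit x?"""
    lo, hi = -(1 << (BITS - 1)), 1 << (BITS - 1)
    return all(obligation(x) for x in range(lo, hi) if guard(x))
```

Without the guards, x = 1 falsifies the first obligation and x = −1 the second, which is the path-oblivious failure of Section 2.2.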

Relational analyses do not present any difficulty for LP form, because it has no artificial φ nodes to separately combine alternative versions of variables. This is handled through conventional procedure calls, where the least upper bound is used to combine results for multiple calls. Consider an octagon analysis (Miné 2006) of Figure 9. Much like the analysis discussed in Section 2.3, the analysis derives y + x = 0 ∧ z − x = 0 ∧ y + z = 0 leading to the call to p1 from clause 1, and y − x = 0 ∧ x + z = 0 ∧ y + z = 0 for clause 2. Procedure calls are handled by projecting the abstract state onto the variables appearing in the call, and computing the least upper bound of the states. In this case, this yields (y + z = 0) ⊔ (y + z = 0) = (y + z = 0), preserving the strong results obtained for both clauses.

The other issues for SSA and FP form discussed in Section 2 are trivially addressed by LP form. Lacking φ nodes, LP form has no issue with name management. Because LP form is relational, it has no issue with input/output asymmetry. And because each clause has its own scope, the scope of each variable is obvious.

⁴ Since the definition of p1 is so simple, in practice it would be inlined, but that would only give us stronger analysis results.
5 Specialised analyses for LP form

Liveness analysis is a standard program analysis used to determine, for each program point, the set of variables whose values may be needed later. The single assignment property enjoyed by SSA, FP, and LP forms somewhat simplifies this analysis: because each variable is assigned only once, it is not necessary to take account of variable reassignment. Within a single block (clause) of SSA (LP form) code, this is easily done by traversing the statements backward, noting the first encountered use of each variable, which will be the last use in forward execution, and each variable assignment, which will be the definition of that variable. To handle liveness for a whole function, analysis results must be propagated backward between blocks.
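The backward traversal within a single clause can be sketched as follows. We assume each goal is summarised by its defined and used variables (a hypothetical encoding). The example is the first clause of Figure 8, with the guard b ≠ 0 as a goal that defines nothing.

```python
def live_in(goals, live_out):
    """Backward liveness over one clause (or SSA block).

    goals: list of (defs, uses) pairs in execution order.
    With single assignment, a definition simply ends a variable's live range.
    """
    live = set(live_out)
    for defs, uses in reversed(goals):
        live -= set(defs)  # defined here: not live before this point
        live |= set(uses)  # used here: live before this point
    return live


# gcd(a, b; ret) <- b != 0, mod(a, b; b'), gcd(b, b'; ret)
clause1 = [([], ["b"]), (["b'"], ["a", "b"]), (["ret"], ["b", "b'"])]
entry_live = live_in(clause1, {"ret"})
```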

Dead code elimination is a transformation to remove unnecessary code. Any code that assigns only dead variables can be removed, but doing so may remove variable uses, and thereby produce stronger results for liveness analysis. Thus it is beneficial to perform liveness analysis and dead code elimination simultaneously. If this is extended beyond individual functions to an entire module or even a whole program, more dead code can be eliminated.
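A sketch of liveness and dead code elimination fused into one backward pass over a block, assuming each statement is tagged with whether it is pure (free of side effects). Removing a statement also removes its uses, which can make earlier statements dead within the same pass.

```python
def eliminate_dead(block, live_out):
    """One backward pass doing liveness and dead code elimination together.

    block: list of (text, defs, uses, pure) in execution order.
    Returns the kept statements (in order) and the variables live on entry.
    """
    live = set(live_out)
    kept = []
    for text, defs, uses, pure in reversed(block):
        if pure and defs and not set(defs) & live:
            continue  # defines only dead variables: drop it, so its uses vanish too
        live -= set(defs)
        live |= set(uses)
        kept.append(text)
    kept.reverse()
    return kept, live


block = [("t1 = a + b", ["t1"], ["a", "b"], True),
         ("t2 = t1 * c", ["t2"], ["t1", "c"], True),
         ("r = a - c", ["r"], ["a", "c"], True),
         ("print(r)", [], ["r"], False)]
kept, live = eliminate_dead(block, set())
```

Here t2 is dead, and once its statement is dropped t1 becomes dead too; a liveness pass run separately from elimination would have kept t1 live.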

We present a two-phase interprocedural neededness analysis, which combines liveness analysis and dead code elimination. The first phase computes neededness dependencies: conjunctions of implications of the form x → y, signifying that if variable x is needed on completion of a goal, then y is needed on entry. This analysis can be performed bottom-up over a module’s call graph, one strongly connected component (SCC) at a time, which ensures that all callees of a given procedure, except those in the same SCC, will be analysed before the procedure itself. A fixed point must be computed for each SCC, but no iteration is necessary between SCCs. This reduces the number of procedures analysed in each fixed point iteration, since SCCs are typically fairly small.
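A bottom-up SCC order can be obtained with Tarjan's algorithm, which emits each strongly connected component only after every component reachable from it, i.e., callees before callers. This is a standard textbook sketch over a hypothetical call graph given as a dict. It is recursive, so very deep call graphs would need an iterative version.

```python
def sccs_bottom_up(graph):
    """Tarjan's algorithm; graph maps each procedure to the procedures it calls.

    Components are returned callees-first, the order needed for a
    bottom-up analysis.
    """
    index, low = {}, {}
    stack, on_stack, result = [], set(), []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            result.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return result


calls = {"main": ["f", "g"], "f": ["g"], "g": ["h"], "h": ["g"]}
order = sccs_bottom_up(calls)
```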

Formally, we define our neededness dependency domain N as the set of conjunctions of variable → variable implications, where an individual implication x → y indicates that if variable x is needed, then so is y. We let S denote the Goal → N neededness dictionary function space, specifying neededness dependencies for many procedures. We define our analysis with the following functions:

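$$
\begin{aligned}
P_d &:: \mathcal{P}(\mathit{Proc}) \to S & \qquad C_d &:: \mathit{Goal}^* \to S \to N \\
D_d &:: \mathit{Proc} \to S & \qquad G_d &:: \mathit{Goal} \to \mathcal{P}(\mathit{Var}) \to S \to N
\end{aligned}
$$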

Here Pd gives the neededness dictionary for all the procedures in the module; Dd yields 
the dictionary for a single procedure; Cd produces the neededness of a single clause given 
a neededness dictionary; and Gd gives the neededness of a single goal given the set of 
variables needed later in the clause body and a neededness dictionary. 

As shown in Figure 10, the neededness analysis of a module is the least fixed point of
the combination of results for all procedures in the module, and the result for a procedure
is just the conjunction of the neededness of all its clauses, which is the conjunction of
results for all goals in each clause. The analysis result for a primitive operation is the
conjunction of x → y implications for each output x and each input y. For a primitive
comparison operation, it is the conjunction of x → y for each variable x defined later in
the clause (determined by the defs function) and each input y of the comparison. Since
primitive comparisons are guards, they are only needed to determine if the following code
is executed, so they are only needed if some variable defined later is needed.
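Figure 10 is only summarised in the text, so the following Python sketch is an illustrative encoding of these rules rather than a transcription of the figure. A conjunction of implications x → y is represented as a set of `(x, y)` pairs. The goal classes `Prim`, `Compare`, and `Call` are hypothetical. A dictionary entry records a callee's dependencies positionally as `(output index, input index)` pairs. `proc_deps` assumes that all clauses of a procedure use the same parameter names, and it projects the result onto the parameters by following chains of implications through local variables. Iterating `proc_deps` from an empty dictionary within each SCC would give the fixed point.

```python
from dataclasses import dataclass


# Hypothetical goal encodings (not the paper's syntax); constants are omitted.
@dataclass
class Prim:
    outputs: tuple     # e.g. z = add(x, y): outputs ("z",), inputs ("x", "y")
    inputs: tuple


@dataclass
class Compare:
    inputs: tuple      # a guard such as x < y; it defines no variables


@dataclass
class Call:
    proc: str
    outputs: tuple     # actual output arguments
    inputs: tuple      # actual input arguments


def defs(goals):
    """Variables defined by a sequence of goals."""
    return {v for g in goals if not isinstance(g, Compare) for v in g.outputs}


def goal_deps(goal, later_defs, dictionary):
    """Implications (x, y), read as x -> y, contributed by one goal."""
    if isinstance(goal, Prim):
        return {(x, y) for x in goal.outputs for y in goal.inputs}
    if isinstance(goal, Compare):
        # a guard is needed only if something defined after it is needed
        return {(x, y) for x in later_defs for y in goal.inputs}
    # a call: rename the callee's positional (output index, input index) pairs
    return {(goal.outputs[i], goal.inputs[j])
            for i, j in dictionary.get(goal.proc, set())}


def clause_deps(goals, dictionary):
    """Conjunction (union) of the implications of every goal in a clause."""
    deps = set()
    for k, goal in enumerate(goals):
        deps |= goal_deps(goal, defs(goals[k + 1:]), dictionary)
    return deps


def proc_deps(clauses, out_params, in_params, dictionary):
    """Conjoin all clauses, then keep output -> input implications that hold via
    chains through local variables (assumes clauses share parameter names)."""
    edges = {}
    for goals in clauses:
        for x, y in clause_deps(goals, dictionary):
            edges.setdefault(x, set()).add(y)
    result = set()
    for i, out in enumerate(out_params):
        reached, todo = set(), [out]
        while todo:                    # everything `out` transitively needs
            for w in edges.get(todo.pop(), ()):
                if w not in reached:
                    reached.add(w)
                    todo.append(w)
        result |= {(i, j) for j, inp in enumerate(in_params) if inp in reached}
    return result


# g(x, y) -> (r, s): r = add(x, 1); s = mul(y, y)
g_deps = proc_deps([[Prim(("r",), ("x",)), Prim(("s",), ("y",))]],
                   ("r", "s"), ("x", "y"), {})
# g_deps == {(0, 0), (1, 1)}
# h(a, b) -> (c): (c, d) = g(a, b), so output c needs only input a
h_deps = proc_deps([[Call("g", ("c", "d"), ("a", "b"))]],
                   ("c",), ("a", "b"), {"g": g_deps})
# h_deps == {(0, 0)}
```

The result for `h` already shows that its input `b` is never needed, because the only output of `g` that depends on it is discarded.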
Fig. 10. Neededness abstract interpretation


The second analysis phase uses these dependencies to determine which procedure
inputs and outputs are actually used, beginning by marking all parameters of public
(exported) functions as needed. This analysis then proceeds top-down by SCCs through the
program call graph, with each SCC processed until a fixed point is reached. In each
iteration, each clause in the SCC is processed with a needed variable formula, initially the
conjunction of the set of output variables of that procedure that are marked as needed.

Processing of a clause proceeds from last goal to first. If any output of a goal is in the 
needed variable formula, the goal is marked as needed, and the called procedure has its 
needed outputs marked for when it is processed. Then the neededness dictionary for the 
called procedure is conjoined with the current needed variable formula, and the goal’s 
output variables are projected out, to produce the new needed variable formula. This 
formula then comprises the live variable set for that goal. Primitive goals are simpler: if
any output is in the needed variable formula, the goal is marked as needed, its inputs are added
to the needed variable formula, and its outputs are projected out. Once a fixed point has 
been reached, any goal or parameter not marked as needed can be removed. 
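A Python sketch of the per-clause step, under a simplifying assumption: the needed variable formula is approximated by a plain set of needed variables. With that approximation, conjoining a callee's implications and projecting out the goal's outputs reduce to set operations. The `Goal` record and the positional dictionary format are hypothetical. Comparisons are not modelled, and the enclosing fixed-point iteration over each SCC is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    # Hypothetical goal record: a call, or a primitive when primitive=True.
    proc: str
    outputs: tuple
    inputs: tuple
    primitive: bool = False


def process_clause(goals, needed_outputs, dictionary):
    """Second-phase processing of one clause, from last goal to first.

    The needed variable formula is approximated by a set of needed variables.
    dictionary maps a procedure to (output index, input index) dependencies.
    Returns the indices of needed goals, the output positions needed from each
    callee, the clause's needed variables on entry, and each goal's live set.
    """
    needed = set(needed_outputs)
    marked, callee_needs = set(), {}
    live = [set() for _ in goals]
    for k in range(len(goals) - 1, -1, -1):
        g = goals[k]
        hit = {i for i, v in enumerate(g.outputs) if v in needed}
        if hit:
            marked.add(k)
            if g.primitive:            # every input of a needed primitive is needed
                needed |= set(g.inputs)
            else:                      # record callee outputs, apply its dependencies
                callee_needs.setdefault(g.proc, set()).update(hit)
                needed |= {g.inputs[j]
                           for i, j in dictionary.get(g.proc, ()) if i in hit}
        needed -= set(g.outputs)       # project out the goal's outputs
        live[k] = set(needed)
    return marked, callee_needs, needed, live


clause = [
    Goal("add", ("e",), ("b",), primitive=True),   # e = add(b, 1); e is unused
    Goal("g", ("c", "d"), ("a", "b")),              # (c, d) = g(a, b)
]
marked, callee_needs, needed, live = process_clause(
    clause, {"c"}, {"g": {(0, 0), (1, 1)}})
# marked == {1}: goal 0 can be removed; callee_needs == {"g": {0}}; needed == {"a"}
```

The `callee_needs` result is what would be propagated top-down to `g`, so that its output `d`, and the code computing it, can be removed when `g` is processed.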


6 Related Work 


Many variants of SSA have been proposed (Ballance et al. 1990; Gerlek et al. 1995;
Chow et al. 1996; Ananian 1999), and much work has been concerned with how to generate
(compact) SSA and its variants efficiently (Cytron et al. 1991; Tu and Padua 1995;
Ananian 1999).


In Section 2 we mentioned the work on FP form by Kelsey (1995) and Appel (1998).


Appel (1998) in fact sketches two translations to FP form, one producing a “flat”
sequence of function definitions, the other producing nested definitions. The latter uses
fewer functions and variables, and Appel points out that the structure of function nesting
makes the dominance properties of the original control-flow graph explicit. Appel (1998)
also uses the equivalent of SSI’s “σ nodes” as a pedagogic tool; the σ nodes are in fact
pushed into successor blocks and become mere “renaming” φ nodes.
Peralta and Cruz-Carlon (2006) briefly sketched a translation from SSA to CLP, but
provided no formal definition of the translation. From their examples it is clear that the
translation differs from the one suggested here. Peralta et al. (1998) showed how to use
CLP analysis tools to analyse imperative programs. Their approach is based on having
an interpreter, written in CLP, for the imperative language and then translating (small)
imperative programs through partial evaluation of the interpreter.


Spoto et al. (2010) implement a termination analyzer for Java bytecode by expressing
path-length reasoning as a CLP program and leveraging existing CLP termination
analysis tools. The resulting analyzer is robust and entirely automatic, covering the full
language of Java bytecode. Albert et al. (2012) use a similar approach to cost analysis.


Morales et al. (2015) explore the use of a logic programming language for the
implementation of efficient abstract machines and runtime systems. To this end they use a
Prolog variant with certain imperative features (mutable variables) that enables
translation into efficient C-style code while still allowing for high-level program
transformations, such as partial evaluation of instruction definitions.

CLP has also been used as the basis for software model checking of concurrent systems
(Delzanno and Podelski 1999; Flanagan 2003), and its use in software verification tools is
rapidly growing. For example, it has been adopted in Threader (Gupta et al. 2011), UFO
(Albarghouthi et al. 2012), SeaHorn (Gurfinkel et al. 2015), HSF (Grebenshchikov et al. 2012),
VeriMAP (De Angelis et al. 2014), Eldarica (Rümmer et al. 2013), and TRACER (Jaffar et al.
2012). The task of encoding verification conditions is different from our aim of providing a
platform for program compilation, although both require a convenient representation for
reasoning about programs.


7 Conclusions 

We have described Static Single Assignment form, and discussed a number of problems it 
causes for sophisticated analyses. Many of these problems have been previously addressed, 
but no previous work has addressed all of them. One approach that addressed several of 
these problems re-conceives a low-level program as a functional program. 

We propose going further and viewing a low-level program as a logic program, and
have suggested a simple, deterministic, strongly moded logic programming language as
a compiler intermediate representation. The language is fully declarative; many existing
analyses for logic programming languages will apply directly. We have presented an
interprocedural neededness analysis for this form, which combines liveness analysis with
dead code elimination. Because LP form uses procedure calls for all control transfer,
operations that cross block boundaries are naturally interprocedural. Owing to determinism
and single-mode restrictions, LP form is surprisingly close to machine language, so final
code generation is not difficult. Thus LP form is a suitable choice for a compiler’s
intermediate code representation.

We are currently developing an implementation of LP form, which we call LPVM.
This is being used as the intermediate representation for a compiler we are developing for a
language combining the benefits of declarative and imperative programming. Since the
procedures of that language support multiple outputs, that facility in LP form is
particularly important. Rather than duplicating the extensive work of the LLVM project
in producing high-quality, peephole-optimised assembly language for multiple
architectures, we plan to do all program analysis and transformation in LP form, and then
translate to LLVM for final code generation.